ABSTRACT

In this note we revisit the Fisher information content of cosmological power spectra or two-point functions of Gaussian fields in 
order to comment on the assumption of Gaussian estimators and the use of parameter dependent covariance matrices for parameter 
inference in the context of precision cosmology. We argue that, although the assumption of a Gaussian likelihood is 
motivated by the central limit theorem, it leads, if used consistently, to a Fisher information content that violates the Cramer-Rao 
bound, due to the presence of independent but artificial information from the parameter dependent covariance matrix. At any fixed 
multipole, this artificial term is shown to become dominant in the limit of a large number of correlated fields. While the distribution 
of the estimators does indeed tend to a Gaussian with a large number of modes, it is shown, however, that its Fisher information 
content does not, in the sense that their covariance matrix never carries independent information content, precisely because of the 
non-Gaussian shape of the distribution. We discuss in this light the use of parameter dependent covariance matrices with Gaussian 
likelihoods for parameter inference from two-point statistics. As a rule of thumb, Gaussian likelihoods should always be used with 
a covariance matrix fixed in parameter space, since only this guarantees that a conservative information content is assigned to the 
observables, and prevents the appearance of biases. 
1. Introduction 



Starting from the second half of the nineties (Jungman et al.
1996a,b; Tegmark 1997; Tegmark et al. 1997), the calculation of
Fisher information matrices in order to understand quantitatively
the constraining power of an experiment has become ubiquitous
in cosmology, with its fundamental aspects now covered in
cosmological textbooks (Dodelson 2003; Durrer 2008, section
11 and 6 respectively, e.g.) or (Heavens 2009). This is especially
true for experiments aimed at measuring power spectra of
close to Gaussian fields, since in this case very handy analytical 
expressions can be obtained that can be applied in a variety of 
major cosmological subfields, such as for instance the CMB, 
galaxy clustering, weak lensing as well as their combination. 
Nevertheless, even applied to Gaussian variables, Fisher information
matrices are not totally free of subtleties. In this
note, we revisit the two different possible perspectives on the 
Fisher information content of such spectra. One starting point 
is often the assumption of Gaussian errors. We point out that 
this assumption of a multivariate Gaussian likelihood for power 
spectra estimators is not fully consistent for the purpose of 
understanding their information content, due to a term violating 
the Cramer-Rao inequality, that we show is not necessarily 
small. Too much information is therefore assigned to the spectra 
under this assumption. We show that we can understand why
this term is artificial precisely from the non-Gaussian properties
of the estimators, and discuss the reasons why the usual formula,
i.e. without this term, or setting the covariance matrix to be
parameter independent, still gives the correct amount of information.
Our considerations apply equally to the spectra
or the real space two-point correlation function. Indeed, since
the correlation function and the power spectrum are linearly
related, the assumption of Gaussian power spectra is equivalent
to the assumption of Gaussian two-point functions. We then
comment on the role of the model parameter dependence of
the covariance matrix within the Gaussian approximation, as
studied in (Eifler et al. 2009; Labatie et al. 2012).

The note is organized as follows. In section 2, the two common
approaches to the information content of spectra are
discussed in detail in the case of a single field. We clarify
to what extent and why one is actually flawed, which is the
source of our comments on the use of Gaussian likelihoods and
parameter dependent covariances. In section 3, we then turn
to a correlated family of fields, where the violation of the
Cramer-Rao inequality is shown to become substantial. We
summarize and conclude in section 4.

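We recall first the specific form of the Fisher information
matrix, defined for a probability density function $p$ as
$F_{\alpha\beta} = \langle \partial_\alpha \ln p\,\partial_\beta \ln p\rangle$, with $\alpha,\beta$ model parameters of interest, in the
particular case of a multivariate Gaussian distribution with mean
vector $\mu$ and covariance matrix $\Sigma$ (Vogeley & Szalay 1996):

$$F_{\alpha\beta} = \frac{\partial\mu^T}{\partial\alpha}\Sigma^{-1}\frac{\partial\mu}{\partial\beta} + \frac{1}{2}\mathrm{Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right]. \tag{1}$$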



Remember that the Fisher information matrix has all the prop- 
erties a meaningful measure of information on parameters must 
have, most importantly for us here the fact that any transforma- 
tion of the data can only decrease its Fisher information matrix, 
regardless of the data distribution and parameter posterior. Thus, 
the Fisher information of the distribution of any estimator can 
only be lower than or equal to that of the data it is applied to. 

2. One field, gamma distribution 

Consider a zero mean isotropic homogeneous Gaussian random 
field, in Euclidean space or on the sphere. It is well known 
that the Gaussianity of the field is equivalent to the fact that the 
Fourier or spherical harmonic coefficients are independent com- 
plex Gaussian variables, only constrained by the reality condi- 
tion. Equivalently, the real and imaginary parts of those coef- 
ficients form independent real Gaussian variables. Such fields 
are entirely described by their spectrum, and so the extraction 
of the spectrum from the data with the help of an estimator is a 
fairly natural way to proceed for inference on parameters of interest.
We place ourselves on the sphere, adopting the spherical
harmonic notation for convenience. With $a_{\ell m}$ the set of
harmonic coefficients, the model parameter dependent spectrum $C_\ell$
is defined as

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle = \delta_{\ell\ell'}\delta_{mm'}\,C_\ell. \tag{2}$$

Standard, unbiased quadratic estimators can be written as a sum
over the number of Gaussian modes available, as

$$\hat C_\ell = \frac{1}{2\ell+1}\sum_{m=-\ell}^{\ell} |a_{\ell m}|^2. \tag{3}$$



We do not consider any source of observational noise, incom- 
plete coverage or any other such issue, which are irrelevant for 
the points of our discussion. At this point, there are two ways 
to approach the problem of evaluating its information content in 
the cosmological literature. The first - let us call this approach 
the 'field perspective' - first calculates the information content 
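of the field itself (equal to that of the set of $a_{\ell m}$'s), and then
interprets this information as being that of the spectrum. In this
case, the information in the field is given by formula (1), with
zero mean vector and diagonal covariance matrix $C_\ell$,

$$F_{\alpha\beta} = \frac{1}{2}\sum_\ell (2\ell+1)\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\alpha}\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\beta}, \tag{4}$$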



where the factor $2\ell+1$ accounts for the number of independent
Gaussian variables at a given multipole $\ell$. The sum is in practice
restricted to the multipole range that will actually be measured
to obtain the information in the spectrum to be extracted.
A very small sample of works in this approach are Tegmark et al.
(1997); Hu & Jain (2004); Bernstein (2009). This approach is arguably
conceptually appealing, as it deals with the information
content of the field itself, and does not require the definition of 
estimators nor the calculation of their covariance. However, for 
the same reasons, it is only indirectly connected to data analysis 
as it is not yet specified precisely how this information content 
is to be extracted. 
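
As an illustration of the field perspective, the sum in equation (4) can be evaluated directly once the logarithmic derivatives of the spectrum are known. The sketch below assumes a hypothetical power-law model $C_\ell = A\,\ell^{-s}$ with parameters $(A, s)$; the model, the function names and the multipole range are illustrative choices that do not come from the text.

```python
import math

def log_derivatives(l, amplitude):
    """(d ln C_l / dA, d ln C_l / ds) for the toy model C_l = A * l**(-s).

    For a power law these do not depend on the slope s itself."""
    return (1.0 / amplitude, -math.log(l))

def field_fisher(l_min, l_max, amplitude):
    """Equation (4): F_ab = 1/2 * sum_l (2l+1) dlnC_l/da dlnC_l/db."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for l in range(l_min, l_max + 1):
        d = log_derivatives(l, amplitude)
        for a in range(2):
            for b in range(2):
                F[a][b] += 0.5 * (2 * l + 1) * d[a] * d[b]
    return F

# Hypothetical multipole range 2 <= l <= 1000 and unit amplitude.
F = field_fisher(2, 1000, amplitude=1.0)
print(F)
```

For this toy model the $AA$ entry is simply half the number of modes in the multipole range divided by $A^2$, which makes explicit that the information grows with the number of independent Gaussian variables.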



In the second approach, which we call the 'estimator perspective',
one first defines an estimator $\hat C_\ell$ for each $C_\ell$ to be extracted, within
some $\ell_{\min}$ and $\ell_{\max}$ (maybe with some bandwidth that we ignore
here), and calculates its covariance matrix
$\Sigma_{\ell\ell'} = \langle \hat C_\ell \hat C_{\ell'}\rangle - C_\ell C_{\ell'}$.
Then it is argued that due to the central limit theorem,
the distribution of the estimator will be approximately Gaussian.
In the case of spectra of Gaussian fields, this is very well
founded, at least for small scale modes, since (3) is a large sum
of well behaved identically distributed independent variables.
Then, under this assumption of Gaussianity, their information
content is given by equation (1), with mean vector this time the
set of $C_\ell$'s itself and (model parameter dependent) covariance
matrix $\Sigma_{\ell\ell'}$,

$$F_{\alpha\beta} = \sum_{\ell,\ell'=\ell_{\min}}^{\ell_{\max}} \frac{\partial C_\ell}{\partial\alpha}\,\Sigma^{-1}_{\ell\ell'}\,\frac{\partial C_{\ell'}}{\partial\beta} + \frac{1}{2}\mathrm{Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right]. \tag{5}$$



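It is well known that for the estimator (3) we have
$\Sigma_{\ell\ell'} = \delta_{\ell\ell'}\,2C_\ell^2/(2\ell+1)$. The Fisher information matrix, in the estimator
perspective, thus reduces to

$$F_{\alpha\beta} = \frac{1}{2}\sum_\ell (2\ell+1)\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\alpha}\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\beta} + 2\sum_\ell \frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\alpha}\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\beta}. \tag{6}$$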

Clearly, the first term in the estimator perspective corresponds
to that of the field perspective. However, the second term,
coming from the derivative of the covariance matrix, is new.
That term is not enhanced by a $(2\ell+1)$ factor, and is therefore
very subdominant at high $\ell$. Usually, either it is neglected, or the
covariance matrix of the estimators is inconsistently taken to be
parameter independent, and in these cases the two approaches
give the same results. Some expositions using explicitly this
perspective include (Tegmark 1997; Seo & Eisenstein 2003,
2007), where the additional term is neglected, or the approach
in (Dodelson 2003, section 11.4.3), where the covariance matrix
is treated as parameter independent. Works where this term
plays the main role are (Eifler et al. 2009; Labatie et al. 2012),
where the authors specifically study the impact of parameter
dependent covariance matrices for parameter estimation using
such Gaussian likelihoods.
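
The relative size of the two terms in equation (6) can be read off multipole by multipole. A minimal sketch, in which the function name and the unit logarithmic derivatives are arbitrary illustrative choices:

```python
def estimator_terms(l, dlnC_a, dlnC_b):
    """Contributions of a single multipole l to the two terms of equation (6).

    dlnC_a and dlnC_b stand for (1/C_l) dC_l/d(alpha) and (1/C_l) dC_l/d(beta)."""
    mean_term = 0.5 * (2 * l + 1) * dlnC_a * dlnC_b   # identical to the field result (4)
    covariance_term = 2.0 * dlnC_a * dlnC_b           # from the derivative of the covariance
    return mean_term, covariance_term

for l in (2, 10, 100, 1000):
    mean_term, covariance_term = estimator_terms(l, 1.0, 1.0)
    # The ratio equals 4 / (2l + 1), independently of the derivatives.
    print(l, covariance_term / mean_term)
```

The ratio $4/(2\ell+1)$ is of order unity at the lowest multipoles and falls below one percent only beyond $\ell \approx 200$.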

Beyond the question of the quantitative relevance of this 
additional term, its very appearance is however very disturbing. 
Under this arguably reasonable Gaussian assumption, our
estimator $\hat C_\ell$ is found to carry more information than the full
field, even on the smallest scales. This obviously violates the
most fundamental property of Fisher information, i.e. that
information can only be at best conserved when transforming
the data (in this case reducing the field to its spectrum), a fact
essentially equivalent to the celebrated Cramer-Rao inequality
(Tegmark et al. 1997). Something must clearly have gone wrong
in the assumption of a Gaussian likelihood for our spectra.

To understand what has happened, it is worth tracking the
exact distribution and information content of the estimator (3).
Since the estimators are independent at different $\ell$, we can work at a fixed
$\ell$, and the total information content of these estimators will
simply be the sum over $\ell$ of the information of the estimator at
fixed $\ell$. Under our assumptions, the estimator is a sum of squares
of $2\ell+1$ independent Gaussian variables, and its probability
density function can be obtained with no difficulty. The exact
distribution is the gamma probability density function with
shape parameter $k$ and scale parameter $\theta$ as follows

$$p(\hat C_\ell|\alpha,\beta) = \frac{\hat C_\ell^{\,k-1}\exp\left(-\hat C_\ell/\theta\right)}{\theta^k\,\Gamma(k)}, \tag{7}$$

with

$$k = \frac{1}{2}(2\ell+1), \qquad \theta = \frac{2C_\ell}{2\ell+1}, \tag{8}$$



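and where $\Gamma$ is the gamma function. It is well known that the
gamma distribution does indeed tend towards the Gaussian distribution
for large $k$, with mean $\mu = k\theta = C_\ell$ and variance
$\sigma^2 = k\theta^2 = 2C_\ell^2/(2\ell+1)$, as expected. However, its Fisher
information content does not tend to that of the Gaussian. In
our case, since only $\theta$ is parameter dependent, we have that the
Fisher information in the estimator density function (7) is

$$F^\ell_{\alpha\beta} = \frac{\partial\theta}{\partial\alpha}\frac{\partial\theta}{\partial\beta}\left\langle\left(\frac{\partial\ln p(\hat C_\ell)}{\partial\theta}\right)^2\right\rangle. \tag{9}$$

Since $\partial_\theta \ln p = (\hat C_\ell - k\theta)/\theta^2$, and $\partial_\alpha\theta = \theta\,\partial_\alpha C_\ell/C_\ell$, we obtain
with straightforward algebra

$$F^\ell_{\alpha\beta} = \frac{1}{2}(2\ell+1)\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\alpha}\,\frac{1}{C_\ell}\frac{\partial C_\ell}{\partial\beta}. \tag{10}$$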



Summing over $\ell$, we recover the first term of (6), but not the second.
We have recovered the field perspective result (4) at any $\ell$
without the Gaussian assumption but with the exact distribution.
It turns out that even though the variance of the gamma distribution
is parameter dependent, it does not in fact contribute to
the information. This can be seen as follows. Consider the
information in the mean only of the estimator. From the Cramer-Rao
inequality this must be less than or equal to the total information,

$$\frac{1}{\sigma^2}\frac{\partial\mu}{\partial\alpha}\frac{\partial\mu}{\partial\beta} \le F^\ell_{\alpha\beta}. \tag{11}$$



Plugging in the values for the mean and variance leads in fact to 
the result that the inequality is an equality, so that the mean of 
the estimator captures all of its information. 
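
The equality in (11) can be checked numerically. The sketch below takes the parameter to be $C_\ell$ itself and evaluates the expectation value in (9) with a simple midpoint-rule integration of the gamma density (7); the integration grid, its cutoff and the function names are illustrative assumptions.

```python
import math

def gamma_pdf(x, k, theta):
    """Gamma density of equation (7) with shape k and scale theta."""
    log_p = (k - 1.0) * math.log(x) - x / theta - k * math.log(theta) - math.lgamma(k)
    return math.exp(log_p)

def exact_fisher(l, C, n_steps=100000):
    """Fisher information on C_l from equations (7)-(9), for alpha = beta = C_l."""
    k = 0.5 * (2 * l + 1)
    theta = 2.0 * C / (2 * l + 1)
    x_max = k * theta + 40.0 * math.sqrt(k) * theta   # cutoff far into the exponential tail
    h = x_max / n_steps
    expectation = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * h                              # midpoints, so x = 0 is never hit
        score = (x - k * theta) / theta**2             # d ln p / d theta
        expectation += score**2 * gamma_pdf(x, k, theta) * h
    dtheta_dC = 2.0 / (2 * l + 1)
    return dtheta_dC**2 * expectation

def mean_only_fisher(l, C):
    """Left-hand side of (11) with mu = C_l and sigma^2 = 2 C_l^2 / (2l+1)."""
    return (2 * l + 1) / (2.0 * C**2)

def gaussian_fisher(l, C):
    """Both terms of equation (6) at a single multipole."""
    return mean_only_fisher(l, C) + 2.0 / C**2

for l in (2, 10):
    print(l, exact_fisher(l, 1.0), mean_only_fisher(l, 1.0), gaussian_fisher(l, 1.0))
```

Within the accuracy of the integration, the exact and mean-only values agree, while the Gaussian expression with parameter dependent variance exceeds both by $2/C_\ell^2$.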

In summary, the Gaussian approximation assumes the mean
and the variance of the estimator are uncorrelated, such that
both contribute to the information, while for the exact gamma
distribution they are degenerate in such a way that the variance does not
carry independent information. Another way to see this, which
we will use below when the exact form of the distribution is
less convenient, comes from the fact that $\partial_\theta \ln p(\hat C_\ell)$ is a first
order polynomial in $\hat C_\ell$. Indeed, it can be shown that the first $n$
moments capture all the information precisely when $\partial_\alpha \ln p$ is a
polynomial of order $n$ (Cannon 2011).

The fact that this function $\partial_\theta \ln p(\hat C_\ell)$ is correctly reproduced
by the Gaussian assumption with variance treated as
fixed in parameter space has another interesting consequence
that is relevant for cosmological parameter inference. Namely,
performing parameter inference under that assumption does
not shift the maximum likelihood point $\partial_\theta \ln p(\hat C_\ell) = 0$, since this
function is identical to that of the true likelihood. Therefore,
no bias is introduced. This is no longer true if the variance
is treated as parameter dependent, where it is not difficult to
see that the peak of the likelihood gets shifted by some amount
decaying with $\ell$.
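
The shift of the likelihood peak can be made concrete for a single multipole, taking the parameter to be $C_\ell$ itself. The sketch below minimizes the Gaussian $-\ln L$ with a ternary search, once with the variance $2C_\ell^2/(2\ell+1)$ following the parameter and once with the variance frozen at a fiducial value; the search interval, the fiducial value and the function names are illustrative assumptions.

```python
import math

def neg_log_like(C, C_hat, l, C_fid=None):
    """Gaussian -ln L (up to a constant) for one multipole.

    The variance 2 C^2 / (2l+1) follows the parameter C unless a fixed
    fiducial value C_fid is supplied."""
    C_var = C if C_fid is None else C_fid
    var = 2.0 * C_var**2 / (2 * l + 1)
    return 0.5 * (C_hat - C)**2 / var + 0.5 * math.log(var)

def peak(C_hat, l, C_fid=None, lo=0.2, hi=5.0, iterations=200):
    """Locate the minimum of -ln L on [lo * C_hat, hi * C_hat] by ternary search."""
    a, b = lo * C_hat, hi * C_hat
    for _ in range(iterations):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if neg_log_like(m1, C_hat, l, C_fid) < neg_log_like(m2, C_hat, l, C_fid):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)

for l in (2, 10, 100):
    print(l, peak(1.0, l, C_fid=1.0), peak(1.0, l))
```

With the variance fixed the peak stays at the measured value $\hat C_\ell$. With the parameter dependent variance, setting the derivative of $-\ln L$ to zero gives a peak at $2\hat C_\ell/\left(1+\sqrt{1+8/(2\ell+1)}\right)$, i.e. a downward shift of roughly $2\hat C_\ell/(2\ell+1)$ at high $\ell$.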



3. Several fields 

It is instructive to see how these considerations generalize to a
situation of a family of $n$ jointly zero mean Gaussian correlated
fields, where the analysis proceeds through the extraction of the
spectra and cross spectra. In this case, the $C_\ell$ of the above discussion
becomes an $n \times n$ (possibly complex) Hermitian matrix

$$\langle a^i_{\ell m} a^{j*}_{\ell' m'}\rangle = \delta_{\ell\ell'}\delta_{mm'}\,C^{ij}_\ell, \qquad C_\ell^\dagger = C_\ell. \tag{12}$$

From the hermiticity property there are only $n(n+1)/2$ non-redundant
spectra. Adequate estimators are defined by a straightforward
generalization of equation (3),

$$\hat C^{ij}_\ell = \frac{1}{2\ell+1}\sum_{m=-\ell}^{\ell} a^i_{\ell m} a^{j*}_{\ell m}. \tag{13}$$



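While the estimators are still independent for different $\ell$'s, the
different components at a given $\ell$ are not. The information content
of the set of $a^i_{\ell m}$ in the field perspective is still given by formula
(1) for zero mean Gaussian variables. Explicitly, at a given $\ell$,

$$F^\ell_{\alpha\beta} = \frac{1}{2}(2\ell+1)\,\mathrm{Tr}\left[C_\ell^{-1}\frac{\partial C_\ell}{\partial\alpha}C_\ell^{-1}\frac{\partial C_\ell}{\partial\beta}\right]. \tag{14}$$

In the estimator perspective, assuming the estimators $\hat C^{ij}_\ell$
are jointly Gaussian, we have instead

$$F^\ell_{\alpha\beta} = \sum_{i\le j,\,k\le l=1}^{n} \frac{\partial C^{ij}_\ell}{\partial\alpha}\left[\Sigma_\ell^{-1}\right]_{ij,kl}\frac{\partial C^{kl}_\ell}{\partial\beta} + \frac{1}{2}\mathrm{Tr}\left[\Sigma_\ell^{-1}\frac{\partial\Sigma_\ell}{\partial\alpha}\Sigma_\ell^{-1}\frac{\partial\Sigma_\ell}{\partial\beta}\right], \tag{15}$$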



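where the covariance matrix is

$$\Sigma_{ij,kl} = \langle \hat C^{ij}_\ell \hat C^{kl}_\ell\rangle - C^{ij}_\ell C^{kl}_\ell = \frac{1}{2\ell+1}\left(C^{ik}_\ell C^{jl}_\ell + C^{il}_\ell C^{jk}_\ell\right). \tag{16}$$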

While it may not be immediately obvious this time, it has been
noted (e.g. Hu & Jain (2004)) that the first term in (15) is rigorously
equivalent to the expression from the field perspective
(14). The estimator perspective under the assumption of a multivariate
Gaussian distribution for $\hat C_\ell$ thus still violates the Cramer-Rao
inequality due to the presence of the second term. Since this
term is not enhanced by a factor of $2\ell+1$ we expect it to be
subdominant again. However, this is less true this time than in the
one dimensional setting: using the explicit form of the inverse
covariance matrix,

$$\left[\Sigma_\ell^{-1}\right]_{ij,kl} = \frac{2\ell+1}{(1+\delta_{ij})(1+\delta_{kl})}\left(C_\ell^{-1,ik}C_\ell^{-1,jl} + C_\ell^{-1,il}C_\ell^{-1,jk}\right), \tag{17}$$



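one can derive with some lengthy but straightforward algebra the
following expression for the violating term,

$$\frac{1}{2}\mathrm{Tr}\left[\Sigma_\ell^{-1}\frac{\partial\Sigma_\ell}{\partial\alpha}\Sigma_\ell^{-1}\frac{\partial\Sigma_\ell}{\partial\beta}\right] = \frac{1}{2}(n+2)\,\mathrm{Tr}\left[C_\ell^{-1}\frac{\partial C_\ell}{\partial\alpha}C_\ell^{-1}\frac{\partial C_\ell}{\partial\beta}\right] + \frac{1}{2}\mathrm{Tr}\left[C_\ell^{-1}\frac{\partial C_\ell}{\partial\alpha}\right]\mathrm{Tr}\left[C_\ell^{-1}\frac{\partial C_\ell}{\partial\beta}\right], \tag{18}$$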



for any number $n$ of fields. If $n = 1$, we recover indeed (6).
While the term is still subdominant at high $\ell$, the situation is
yet a bit less comfortable. The number of fields is not necessarily
very small in cosmologically relevant situations, such as
tomographic joint shear and galaxy density analysis in redshift
slices, to which one may also add magnification, flexion fields,
etc. Writing schematically $n = N_f N_{\rm bin}$, where $N_{\rm bin}$ is the number
of bins and $N_f$ the number of fields per bin, we have e.g. $N_f = 3$
for the galaxy density and the two shear fields, $N_f = 4$ including
magnification, $N_f = 8$ adding hypothetically the four flexion
fields, and so on. Comparing (14) and (18), and neglecting the
second term in (18), we have that at



$$\ell \le \frac{n+1}{2} \tag{19}$$



the Cramer-Rao violating term is actually still the dominant one.
Note that this is still optimistic. Due to the product of two traces
in the second term in (18), one can expect roughly the same scaling
with $n$ as the first term. Thus, the correct $\ell$ in (19) may generically
be closer to



$$\ell \sim n. \tag{20}$$
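
The comparison between the field term (14) and the violating term (18) can be sketched numerically. The example below takes the two coefficients in (18) to be $(n+2)/2$ and $1/2$, which reproduce the second term of (6) for $n = 1$, and assumes a hypothetical parameter that rescales all spectra by a common amplitude, so that $C_\ell^{-1}\partial_\alpha C_\ell$ is the identity matrix. The function names are illustrative.

```python
def trace(A):
    """Tr[A] for a square matrix stored as nested lists."""
    return sum(A[i][i] for i in range(len(A)))

def trace_of_product(A, B):
    """Tr[A B] for square matrices stored as nested lists."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def field_term(l, A_alpha, A_beta):
    """Equation (14), with A_x standing for C_l^{-1} dC_l/dx."""
    return 0.5 * (2 * l + 1) * trace_of_product(A_alpha, A_beta)

def violating_term(A_alpha, A_beta):
    """Right-hand side of equation (18), both terms included."""
    n = len(A_alpha)
    return (0.5 * (n + 2) * trace_of_product(A_alpha, A_beta)
            + 0.5 * trace(A_alpha) * trace(A_beta))

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def first_multipole_where_field_dominates(A):
    """Smallest l at which the field term (14) exceeds the violating term (18)."""
    l = 1
    while violating_term(A, A) >= field_term(l, A, A):
        l += 1
    return l

# Hypothetical amplitude-like parameter: C^{-1} dC/d(ln amplitude) = identity.
for n in (1, 3, 30):
    print(n, first_multipole_where_field_dominates(identity(n)))
```

In this amplitude-like case the violating term equals $n(n+1)$ times the squared derivative, and it exceeds the field term for all $\ell \le n$, in line with the scaling argument above.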



From the discussion in section 2 we can easily guess what 
went wrong. Consider the information content of the means 
of the estimators exclusively. This is given for any probability 
density function by weighting the derivatives of the means with 
the inverse covariance matrix, and is thus equal to the correct, 
first term in (15). Since already the means of the estimators do 
exhaust the information in the field, we can therefore already 
conclude that the total information content of the estimators 
must be equal to that of their means, and in particular that the 
covariance does not contribute to the information. As before, 
the second term in the estimator perspective is an artifact of 
the Gaussian assumption. It is interesting though to derive as 
above more explicitly why only the means carry information, 
from the shape of the joint probability density of the estimators. 
The remainder of this section sketches how this can be simply 
performed, leading to equation (26). 

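We restrict ourselves now for the sake of notation to the
case of two fields, $n = 2$, but the following argument holds
for any $n$. The exact joint distribution for the three estimators
$\hat C_\ell = (\hat C^{11}_\ell, \hat C^{12}_\ell, \hat C^{22}_\ell)$ is given from the rules of probability
theory as

$$p(\hat C_\ell) = \left\langle \prod_{i\le j=1}^{2} \delta^D\!\left(\hat C^{ij}_\ell - \frac{1}{2\ell+1}\sum_{m=-\ell}^{\ell} a^i_{\ell m}a^{j*}_{\ell m}\right)\right\rangle, \tag{21}$$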



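where $\delta^D$ is the Dirac delta function. The average is over the joint
probability density for the two sets of harmonic coefficients $a^1_{\ell m}$
and $a^2_{\ell m}$. Define the vector

$$a_\ell = \left(a^1_{\ell,-\ell},\cdots,a^1_{\ell\ell},\,a^2_{\ell,-\ell},\cdots,a^2_{\ell\ell}\right). \tag{22}$$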



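Since the $a_{\ell m}$ are zero mean Gaussian variables with correlations
as given in (12), this probability density function is given by

$$p(a_\ell) = \frac{1}{Z(\alpha,\beta)}\exp\left(-\frac{1}{2}\,a_\ell^\dagger\cdot\mathbf{C}_\ell^{-1}a_\ell\right), \tag{23}$$

with

$$\mathbf{C}_\ell = \begin{pmatrix} C^{11}_\ell\, I_{2\ell+1} & C^{12}_\ell\, I_{2\ell+1} \\ C^{21}_\ell\, I_{2\ell+1} & C^{22}_\ell\, I_{2\ell+1}\end{pmatrix}, \tag{24}$$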



where $I_{2\ell+1}$ is the unit matrix of size $2\ell+1$. $Z(\alpha,\beta)$ is the normalization
of the density for $a_\ell$, which does depend on the model
parameters through the determinant of the block matrix $\mathbf{C}_\ell$. The inverse
matrix $\mathbf{C}_\ell^{-1}$ has the same block structure, with entries being those
of $C_\ell^{-1}$. In the following we are not really interested in keeping
track of the exact value of the components of this matrix, but
only that they are dependent on the model parameters. With the
understanding that $C_\ell^{-1} =: D_\ell$, we have thus, due to the sparse
structure of the $\mathbf{C}_\ell^{-1}$ matrix and the Dirac delta functions in (21),

$$-\frac{1}{2}\,a_\ell^\dagger\cdot\mathbf{C}_\ell^{-1}a_\ell = -\frac{1}{2}(2\ell+1)\sum_{i,j=1,2} D^{ij}_\ell\,\hat C^{ji}_\ell. \tag{25}$$



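Due to the presence of the Dirac delta functions, we can thus
take the exponential (23) out of the integral in (21). Writing
explicitly the dependency of the different terms on $\hat C_\ell$ and the
model parameters, we obtain the following form

$$p(\hat C_\ell|\alpha,\beta) = \frac{f(\hat C_\ell)}{Z(\alpha,\beta)}\exp\left(-\frac{1}{2}(2\ell+1)\sum_{i,j=1,2} D^{ij}_\ell(\alpha,\beta)\,\hat C^{ji}_\ell\right), \tag{26}$$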



which generalizes the gamma distribution, equation (7), to this
multidimensional case$^1$. The factor $f(\hat C_\ell)$ is what is left from the
integral (21) when the density for the set of $a_{\ell m}$ is taken out, i.e.
the volume of the space spanned by the $a_{\ell m}$'s that satisfies the
constraints set by the Dirac delta function. It is thus a factor that
depends on $\hat C_\ell$ but, importantly for us, not on the model parameters
$\alpha$. The point of the representation (26) is that it is immediate
that $\partial_\alpha \ln p(\hat C_\ell)$ is a first order polynomial in the components of
$\hat C_\ell$. Second order terms, corresponding to information within the
covariance matrix, never appear, however close to a Gaussian the
exact density function might be. It follows that the total Fisher
information matrix is always equal to that of the mean, even if
we did not derive the exact shape of the distribution.
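
To make this last step explicit, one can differentiate the logarithm of (26). The following is a sketch for real, symmetric spectra, writing $D_\ell = C_\ell^{-1}$ and using $\langle\partial_\alpha \ln p\rangle = 0$ to eliminate the normalization $Z$:

```latex
\begin{align}
\partial_\alpha \ln p(\hat C_\ell)
  &= -\partial_\alpha \ln Z
     - \frac{1}{2}(2\ell+1)\sum_{i,j}
       \frac{\partial D^{ij}_\ell}{\partial\alpha}\,\hat C^{ji}_\ell \\
  &= -\frac{1}{2}(2\ell+1)\sum_{i,j}
       \frac{\partial D^{ij}_\ell}{\partial\alpha}
       \left(\hat C^{ji}_\ell - C^{ji}_\ell\right).
\end{align}
```

The factor $f(\hat C_\ell)$ drops out because it carries no parameter dependence. Averaging the product of two such scores with the covariance (16), and using $\partial_\alpha D_\ell = -D_\ell\,\partial_\alpha C_\ell\,D_\ell$, then gives $\frac{1}{2}(2\ell+1)\,\mathrm{Tr}\left[C_\ell^{-1}\partial_\alpha C_\ell\,C_\ell^{-1}\partial_\beta C_\ell\right]$, which is the field result (14).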

4. Summary and conclusions 

We discussed two common perspectives (the 'field' and 'es- 
timator' perspectives) on the Fisher information content of 
cosmological power spectra, and why in the estimator perspec- 
tive the assumption of a Gaussian likelihood of the spectra 
estimators violates the Cramer-Rao inequality, assigning the 
estimators more information than there is in the full underlying 
fields. Under the assumption of Gaussianity of the estimators, 
their means and covariance matrix are artificially rendered
uncorrelated, creating an additional piece of information in
their covariance, which we showed to be nonexistent by calculating
the exact information content of the estimators' true probability
density function. We showed that this violating term can
become dominant in the limit of a large number of fields. Using
Gaussian likelihoods consistently, i.e. with parameter dependent
covariance matrices, as studied for example in (Eifler et al.
2009; Labatie et al. 2012), assigns therefore far too much
information to the spectra in this regime, and should thus be 
avoided, as this allows tighter but artificial constraints on the 
parameters and can introduce biases as well. Both a slight 
tightening of the constraints on parameters as well as tiny shifts 
of the parameter posterior maximum are indeed observed in 
these works. It is thus important to realise that these two effects 
do not reflect an improvement over the method of treating the 
covariance matrix fixed, but should be considered spurious. 

In the estimator perspective of the derivation of the Fisher
information matrix, the piece of information coming from the
covariance matrix is usually neglected. This note clarifies why
it should not be present in the very first place, and how the
agreement between the field and estimator perspectives can thus
arguably be seen as a happy cancellation of two inconsistencies.
It is interesting to note that the reason why we still find the
exact result in the estimator perspective without this wrong piece
is that this expression is also the exact Fisher information content
of the exact, for low $\ell$ strongly non-Gaussian, distribution of
the estimators, the central limit theorem playing actually no role.

$^1$ The prefactors in (26) can be obtained in closed form, leading to the
Wishart density function. See e.g. (Hamimeche & Lewis 2008).

The other lesson we can take from this work is that in 
general, when in doubt about the joint distribution of a set of 
estimators, a safe choice of information content is always that 
of their means exclusively, which requires only the knowledge 
of their covariance. Provided the covariance matrix is correctly 
chosen, one is indeed sure for any probability density function 
from the properties of Fisher information to make a conservative 
evaluation, that does not rely on any further assumptions on its 
shape. Thus, leaving aside the question of the very accuracy 
of the approximation itself, using a Gaussian likelihood with 
parameter independent covariance matrix, having the entire 
information in the means, while not entirely consistent remains 
a safe prescription in the sense that a conservative information 
content is always assigned to the estimators. 

Interestingly, the choice of a Gaussian distribution becomes 
motivated in this case not by the central limit theorem, but by 
the fact that a conservative information content is assigned to 
the observables. From the Cramer-Rao bound, the constraints 
on the parameters obtained using this assumption cannot be tighter 
than the ones allowed by the true distribution. It is however 
essential in this respect that the covariance matrix is treated as 
fixed in parameter space. This holds true for any observables 
originating from an arbitrary field distribution. 